
    Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers

    Following the major success of neural language models (LMs) such as BERT or GPT-2 on a variety of language understanding tasks, recent work has focused on injecting (structured) knowledge from external resources into these models. While joint pretraining (i.e., training from scratch, adding objectives based on external knowledge to the primary LM objective) may be prohibitively computationally expensive, post-hoc fine-tuning on external knowledge may lead to catastrophic forgetting of distributional knowledge. In this work, we investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus, using adapter training. While overall results on the GLUE benchmark paint an inconclusive picture, a deeper analysis reveals that our adapter-based models substantially outperform BERT (by up to 15-20 performance points) on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
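
    As a rough illustration of the adapter idea above, the sketch below shows a generic bottleneck adapter layer in PyTorch. The hidden size, bottleneck size, and placement inside BERT are illustrative assumptions, not the paper's exact configuration; only the adapter weights would be trained on the ConceptNet/OMCS data while the pretrained parameters stay frozen.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a residual.

    Inserted after a transformer sublayer so that the frozen pretrained
    representation is preserved and only the small adapter is trained.
    """
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the distributional knowledge intact;
        # the adapter only adds a small learned correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```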

    Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs

    Recent graph-to-text models generate text from graph-based data using either global or local aggregation to learn node representations. Global node encoding allows explicit communication between two distant nodes, but it neglects graph topology, as all nodes are directly connected. In contrast, local node encoding considers the relations between neighboring nodes, capturing the graph structure, but it can fail to capture long-range relations. In this work, we combine both encoding strategies, proposing novel neural models that encode an input graph using both global and local node contexts in order to learn better contextualized node embeddings. In our experiments, we demonstrate that our approaches lead to significant improvements on two graph-to-text datasets, achieving BLEU scores of 18.01 on the AGENDA dataset and 63.69 on the WebNLG dataset for seen categories, outperforming state-of-the-art models by 3.7 and 3.1 points, respectively. Comment: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2020; author's final version; pre-MIT Press publication version.
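
    A minimal sketch of how global and local node contexts might be combined, assuming multi-head attention over all nodes for the global view and a mean over adjacency-matrix neighbors for the local view; the layer sizes and the mixing scheme are illustrative and not the models proposed in the paper.

```python
import torch
import torch.nn as nn

class GlobalLocalNodeEncoder(nn.Module):
    """Toy encoder mixing a global context (attention over all nodes)
    with a local context (mean over graph neighbors)."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_proj = nn.Linear(dim, dim)
        self.mix = nn.Linear(2 * dim, dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, n, dim) node embeddings; adj: (batch, n, n) 0/1 adjacency
        global_ctx, _ = self.global_attn(nodes, nodes, nodes)
        degree = adj.sum(-1, keepdim=True).clamp(min=1)
        local_ctx = self.local_proj(adj @ nodes / degree)  # mean over neighbors
        return torch.tanh(self.mix(torch.cat([global_ctx, local_ctx], dim=-1)))
```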

    Investigating Pretrained Language Models for Graph-to-Text Generation

    Graph-to-text generation aims to generate fluent texts from graph-based data. In this paper, we investigate two recently proposed pretrained language models (PLMs) and analyze the impact of different task-adaptive pretraining strategies for PLMs in graph-to-text generation. We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs), and scientific KGs. We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further. In particular, we report new state-of-the-art BLEU scores of 49.72 on LDC2017T10, 59.70 on WebNLG, and 25.66 on AGENDA - relative improvements of 31.8%, 4.5%, and 42.4%, respectively. In an extensive analysis, we identify possible reasons for the PLMs' success on graph-to-text tasks. We find evidence that their knowledge about true facts helps them perform well even when the input graph representation is reduced to a simple bag of node and edge labels. Comment: Our code and pretrained model checkpoints are available at https://github.com/UKPLab/plms-graph2tex
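
    The snippet below sketches the general recipe of feeding a linearised knowledge graph to a pretrained seq2seq model with Hugging Face Transformers. The <H>/<R>/<T> markers, the example triples, and the stock t5-base checkpoint are assumptions for illustration, not necessarily the exact setup of the paper.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical linearisation: each (head, relation, tail) triple is flattened
# into marker-delimited text before being fed to the seq2seq model.
triples = [("Alan Bean", "occupation", "astronaut"),
           ("Alan Bean", "mission", "Apollo 12")]
source = " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

    In practice the markers would be registered as special tokens and the model fine-tuned on graph-text pairs before generation.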

    Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs

    We present Graformer, a novel Transformer-based encoder-decoder architecture for graph-to-text generation. With our novel graph self-attention, the encoding of a node relies on all nodes in the input graph - not only direct neighbors - facilitating the detection of global patterns. We represent the relation between two nodes as the length of the shortest path between them. Graformer learns to weight these node-node relations differently for different attention heads, thus virtually learning differently connected views of the input graph. We evaluate Graformer on two popular graph-to-text generation benchmarks, AGENDA and WebNLG, where it achieves strong performance while using many fewer parameters than other approaches.
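
    A hedged sketch of the core mechanism described above: shortest-path lengths between node pairs (computed here by breadth-first search) index a learned bias added to the self-attention logits. The single-head layer, dimensions, and distance cap are illustrative choices, not Graformer's actual architecture.

```python
from collections import deque

import torch
import torch.nn as nn

def shortest_path_lengths(adj, max_dist=8):
    """All-pairs shortest-path lengths via BFS; unreachable pairs map to max_dist."""
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] == -1:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    d = torch.tensor(dist)
    d[d < 0] = max_dist          # unreachable pairs get the maximum distance
    return d.clamp(max=max_dist)

class PathBiasedSelfAttention(nn.Module):
    """Single-head self-attention with a learned bias per shortest-path distance."""
    def __init__(self, dim=128, max_dist=8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.dist_bias = nn.Embedding(max_dist + 1, 1)
        self.scale = dim ** -0.5

    def forward(self, x, dist):
        # x: (n, dim) node states; dist: (n, n) integer shortest-path lengths
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.t()) * self.scale + self.dist_bias(dist).squeeze(-1)
        return torch.softmax(logits, dim=-1) @ v
```

    With several such heads, each head can learn its own distance bias and thus attend to a differently connected view of the graph.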

    Saddles in the energy landscape probed by supercooled liquids

    We numerically investigate the supercooled dynamics of two simple model liquids, exploiting the partition of the multi-dimensional configuration space into basins of attraction of the stationary points (inherent saddles) of the potential energy surface. We find that the order and potential energy of the inherent saddles are well-defined functions of the temperature T. Moreover, on decreasing T, the saddle order vanishes at the same temperature (T_MCT) where the inverse diffusivity appears to diverge as a power law. This allows a topological interpretation of T_MCT: it marks the transition from a dynamics between basins of saddles (T>T_MCT) to a dynamics between basins of minima (T<T_MCT). Comment: 4 pages, 3 figures, to be published on PR
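
    In symbols, the two observations above can be summarised roughly as follows, with n_s(T) the mean order of the inherent saddles (the number of negative Hessian eigenvalues); the power-law form and the exponent symbol are a generic sketch, not values reported in the paper.

```latex
% Hedged sketch of the two temperature dependences described in the abstract.
% The exponent \gamma is a placeholder, not a fitted value from the paper.
D^{-1}(T) \propto \left(T - T_{\mathrm{MCT}}\right)^{-\gamma},
\qquad
n_s(T) \to 0 \quad \text{as } T \to T_{\mathrm{MCT}}^{+} .
```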

    Relaxation processes in harmonic glasses?

    A relaxation process, with the associated phenomenology of sound attenuation and sound velocity dispersion, is found in a simulated harmonic Lennard-Jones glass. We propose to identify this process with the so-called microscopic (or instantaneous) relaxation process observed in real glasses and supercooled liquids. A model based on the memory function approach accounts for the observation and allows us to relate to one another: 1) the characteristic time and strength of this process, 2) the low-frequency limit of the dynamic structure factor of the glass, and 3) the high-frequency sound attenuation coefficient, with its observed quadratic dependence on the momentum transfer. Comment: 11 pages, 3 figures
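
    A hedged sketch of the memory-function language used above: the density correlator obeys a generalised oscillator equation, and an exponential (Debye-like) kernel modelling the microscopic relaxation ties its strength and time scale to the sound attenuation in the fast-relaxation limit. The kernel shape and the limit taken below are illustrative textbook forms, not necessarily the model of the paper.

```latex
% Generalised Langevin equation for the density correlator \phi_Q(t),
% with an exponential memory kernel for the microscopic relaxation.
\ddot{\phi}_Q(t) + \omega_0^2(Q)\,\phi_Q(t)
  + \int_0^t m_Q(t-t')\,\dot{\phi}_Q(t')\,\mathrm{d}t' = 0 ,
\qquad
m_Q(t) = \Delta_Q^2\, e^{-t/\tau_Q} .

% In the fast-relaxation (Markovian) limit \omega\tau_Q \ll 1 the kernel acts as
% an effective friction, contributing a sound attenuation of order
\Gamma_Q \simeq \Delta_Q^2\, \tau_Q .
```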

    Elastic constant dishomogeneity and Q^2 dependence of the broadening of the dynamical structure factor in disordered systems

    We propose an explanation for the quadratic dependence on the momentum Q of the broadening of the acoustic excitation peak recently found in the study of the dynamic structure factor of many real and simulated glasses. We ascribe the observed Q^2 law to the spatial fluctuations of the local wavelength of the collective vibrational modes, in turn produced by the dishomogeneity of the inter-particle elastic constants. This explanation is analytically shown to hold for 1-dimensional disordered chains and is satisfactorily tested numerically in both 1 and 3 dimensions. Comment: 4 pages, RevTeX, 5 postscript figures
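
    The numpy sketch below sets up the simplest numerical test mentioned in the abstract: a 1-dimensional harmonic chain with randomly fluctuating spring constants, whose one-phonon dynamic structure factor can be scanned in Q to compare the width of the acoustic peak at different momenta. All parameters, and the simplified (weight-only, classical) form of S(Q, omega), are illustrative assumptions.

```python
import numpy as np

# 1-D harmonic chain with disordered spring constants (unit masses, spacing 1,
# open boundaries). Parameters are illustrative, not those used in the paper.
rng = np.random.default_rng(0)
N = 400
k = np.clip(1.0 + 0.3 * rng.standard_normal(N - 1), 0.1, None)  # elastic constants

# Dynamical matrix of the chain.
D = np.zeros((N, N))
for j in range(N - 1):
    D[j, j] += k[j]; D[j + 1, j + 1] += k[j]
    D[j, j + 1] -= k[j]; D[j + 1, j] -= k[j]

w2, e = np.linalg.eigh(D)              # squared eigenfrequencies and eigenmodes
w = np.sqrt(np.clip(w2, 0.0, None))

def S(Q, omega, eta=0.02):
    """Simplified one-phonon dynamic structure factor, Lorentzian-broadened."""
    phase = np.exp(1j * Q * np.arange(N))
    weight = np.abs(phase @ e) ** 2    # |sum_j e^{iQj} e_n(j)|^2 per mode n
    return np.sum(weight * eta / ((omega - w) ** 2 + eta ** 2))

omegas = np.linspace(0.0, 2.5, 500)
for Q in (0.2, 0.4, 0.8):
    spec = np.array([S(Q, om) for om in omegas])
    half = spec >= spec.max() / 2      # crude full width at half maximum
    print(f"Q={Q:.1f}  peak={omegas[spec.argmax()]:.2f}  "
          f"width~{omegas[half][-1] - omegas[half][0]:.3f}")
```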

    Moisture Control, Inoculant and Particle Size in Tropical Grass Silages

    Fermentation and spoilage losses can be decreased, and aerobic stability during feed-out improved, by several strategies such as wilting, the addition of microbial additives, and the use of moisture absorbents. Particle size reduction may increase bulk density and improve fermentation. The objective of this trial was to evaluate the effects of particle size, moisture content, and a microbial additive on chemical-physical parameters and losses in silages made from Tanzania grass.

    Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning

    Long-horizon task planning is essential for the development of intelligent assistive and service robots. In this work, we investigate the applicability of a smaller class of large language models (LLMs), specifically GPT-2, to robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially. Our method grounds the input of the LLM in the domain, represented as a scene graph, enabling it to translate human requests into executable robot plans and thereby learning to reason over long-horizon tasks, as encountered in the ALFRED benchmark. We compare our approach with classical planning and baseline methods to examine the applicability and generalizability of LLM-based planners. Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating promising potential for the future application of neuro-symbolic planning methods in robotics.
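
    A minimal sketch of the prompting pattern implied above, using the stock GPT-2 checkpoint from Hugging Face Transformers. The scene-graph serialisation, the <graph>/<goal>/<plan> template, and the subgoal syntax in the final comment are hypothetical placeholders rather than the paper's format, and in practice the finetuned checkpoint would be loaded instead of "gpt2".

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Hypothetical serialisation of the scene graph and the user request.
scene_graph = [("apple", "on", "table"), ("microwave", "in", "kitchen")]
graph_text = " ; ".join(f"{s} {r} {o}" for s, r, o in scene_graph)
request = "heat the apple and put it back on the table"
prompt = f"<graph> {graph_text} <goal> {request} <plan>"

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")   # in practice: the finetuned checkpoint

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40,
                         pad_token_id=tokenizer.eos_token_id)
plan = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                        skip_special_tokens=True)
# A finetuned model would ideally emit subgoals such as
# "pickup(apple) ; heat(apple, microwave) ; put(apple, table)".
print(plan)
```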